Hepatology


Supplementary Materials for M

Neural Information Processing Systems

A.1 Extraction Process

As mentioned in Section 2, the patient notes used for M

The motivation of this track was to challenge participants to retrieve relevant articles that could help answer potential questions about a particular patient note. The 2014 and 2015 patient notes are synthetic notes hand-written by individuals with medical training, whereas the 2016 dataset consists of real patient summaries taken from electronic health records.

TREC Clinical Trials (125; 137.7): This track consists of 125 patient notes, 50 from 2021 and 75 from 2022. Its goal was for participants to retrieve previous clinical trials from ClinicalTrials.gov that best match the symptoms described in the patient note. The notes from both years are synthetic notes written by individuals with medical training to simulate an admission statement from an electronic health record (EHR).

MedQA-USMLE (12,893; 135.8): Multiple-choice questions from a professional medical board exam. The questions can include patient summaries and ask about particular issues based on the summary.

We used GPT-3.5-turbo to identify eligible patient notes for each calculator. We first shortlisted the notes that had at least one relevant parameter required for a given calculator. We then kept the notes that contained all of the numeric parameters needed for that calculator. These notes were further filtered so that enough categorical variables could be inferred for 50% of the calculator's total attributes to be present in the note. The attribute extractions for these notes were then verified by the authors of this paper.

Extraction 1 Prompt

For more details on step 1: we provided a set of 32 parameters that cover at least one attribute needed for each of the 55 calculators. For each note in Open-Patients, we applied the prompt shown below to determine which of the 32 parameters could be extracted from that note.
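The note-eligibility filter described in this section can be sketched as set operations over extracted parameter names. This is a minimal sketch, not the authors' pipeline. It assumes that the GPT-3.5-turbo extraction step has already produced a set of parameter names for each note. It also reads the 50% criterion as a lower bound, meaning at least half of a calculator's attributes must be present. The `Calculator` record, the parameter names and the helper functions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Calculator:
    """Hypothetical description of the inputs a calculator needs."""
    name: str
    numeric: frozenset[str]
    categorical: frozenset[str]

    @property
    def attributes(self) -> frozenset[str]:
        return self.numeric | self.categorical


def shortlist(notes: dict[str, set[str]], calc: Calculator) -> list[str]:
    """Keep notes sharing at least one parameter with the calculator."""
    return [note_id for note_id, params in notes.items() if params & calc.attributes]


def is_eligible(params: set[str], calc: Calculator, min_fraction: float = 0.5) -> bool:
    """Require every numeric input, then a minimum share of all attributes."""
    if not calc.numeric <= params:
        return False  # a required numeric parameter is missing
    present = len(calc.attributes & params)
    return present / len(calc.attributes) >= min_fraction  # assumed ">=" reading of 50%


def eligible_notes(notes: dict[str, set[str]], calc: Calculator) -> list[str]:
    """Apply the shortlist, then the numeric and 50% filters, preserving note order."""
    return [n for n in shortlist(notes, calc) if is_eligible(notes[n], calc)]
```

For example, with a hypothetical calculator needing `age` and `creatinine` (numeric) plus `sex` and `diabetes` (categorical), a note with only the two numeric values passes at exactly 50%. A note missing `creatinine` is rejected regardless of its categorical coverage.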



A Diagnosis and Treatment of Liver Diseases: Integrating Batch Processing, Rule-Based Event Detection and Explainable Artificial Intelligence

arXiv.org Artificial Intelligence

Liver diseases pose a significant global health burden, affecting many individuals and carrying substantial economic and social consequences. Liver disease is a growing problem and is considered fatal in many countries, such as Egypt and Moldova. This study aims to develop a diagnosis and treatment model for liver disease. The model uses Basic Formal Ontology (BFO), the Patient Clinical Data (PCD) ontology, and detection rules derived from a decision tree algorithm. The ontology was developed using the National Viral Hepatitis Control Program (NVHCP) guidelines, which made it more accurate and reliable. The Apache Jena framework uses batch processing to detect events based on these rules. Once an event is detected, queries can be processed directly using SPARQL. We convert these Decision Tree (DT) and medical guideline-based rules into Semantic Web Rule Language (SWRL) to operationalize the ontology. We apply these SWRL rules in the ontology to predict different types of liver disease, using the Pellet and Drools inference engines in Protege, on a total of 615 records covering different liver diseases. After the rules are inferred, a result can be generated for each patient. Other patient-related details, along with precautionary suggestions, can then be obtained from these results. Explainable Artificial Intelligence (XAI), combined with open API-based suggestions, can make these rule-based suggestions more accurate. When a medical test is prescribed for the patient, the model incorporates its result using optical character recognition (OCR). The same process applies when further medical advice is prescribed based on the test report. Together, these models form a comprehensive Decision Support System (DSS) for the diagnosis of liver disease.
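One way to picture the conversion of a decision-tree path into an SWRL rule is as follows. Each split condition becomes a data-property atom plus a comparison built-in, and the leaf becomes the rule head. This is a hypothetical sketch, not rules from the paper. It assumes a `Patient` class, and the property names (`hasALT`, `hasAlbumin`), thresholds and conclusion class `SuspectedLiverDisease` are placeholders. The `swrlb:` comparison built-ins are standard SWRL built-ins, and `^` / `->` is the human-readable rule syntax used in Protege.

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard SWRL comparison built-ins for each split operator.
SWRL_BUILTINS = {
    ">": "swrlb:greaterThan",
    ">=": "swrlb:greaterThanOrEqual",
    "<": "swrlb:lessThan",
    "<=": "swrlb:lessThanOrEqual",
}


@dataclass
class Split:
    """One condition on a root-to-leaf decision-tree path (hypothetical names)."""
    prop: str          # OWL data property holding the lab value
    op: str            # one of the keys of SWRL_BUILTINS
    threshold: float   # split threshold learned by the tree


def path_to_swrl(path: list[Split], conclusion_class: str, subject_class: str = "Patient") -> str:
    """Render a decision-tree path as a human-readable SWRL rule string."""
    atoms = [f"{subject_class}(?p)"]
    for i, split in enumerate(path):
        var = f"?v{i}"  # one fresh variable per lab value
        atoms.append(f"{split.prop}(?p, {var})")
        atoms.append(f"{SWRL_BUILTINS[split.op]}({var}, {split.threshold:g})")
    return " ^ ".join(atoms) + f" -> {conclusion_class}(?p)"
```

In a real pipeline, the thresholds would come from the fitted tree and the guideline-derived rules rather than these placeholders. The resulting strings would then be added to the ontology's rule set for a reasoner such as Pellet.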


STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

Neural Information Processing Systems

Recent advances in multi-modal algorithms have driven, and been driven by, the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing both cancerous and healthy regions, but the accompanying text might only specify that the image is a cancer slide. Such text lacks the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles. Each tile is paired with a 15,000- to 30,000-dimensional gene expression profile. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity. This paves the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.
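The pairing of sub-tile images with spot-level gene expression can be illustrated by cropping a fixed-size tile centred on each spot's pixel coordinate. This is a minimal sketch with two assumptions. First, spot centres are already given in image pixel coordinates. Second, the tile size is a hypothetical 224 pixels. STimage-1K4M's actual tile sizes and preprocessing are not described here.

```python
import numpy as np


def tile_spots(image, spot_xy, expression, tile_size=224):
    """Pair a square tile centred on each spot with that spot's expression vector.

    image: (H, W, C) array; spot_xy: (n, 2) pixel coordinates (x = column, y = row);
    expression: (n, g) gene-expression matrix, one row per spot.
    Spots whose tile would cross the image border are skipped.
    """
    half = tile_size // 2
    height, width = image.shape[:2]
    pairs = []
    for (x, y), expr in zip(np.asarray(spot_xy), np.asarray(expression)):
        x0, y0 = int(round(x)) - half, int(round(y)) - half
        if x0 < 0 or y0 < 0 or x0 + tile_size > width or y0 + tile_size > height:
            continue  # tile would fall partly outside the slide image
        pairs.append((image[y0:y0 + tile_size, x0:x0 + tile_size], expr))
    return pairs
```

Each returned `(tile, expr)` pair corresponds to one image/gene-expression pair of the kind the dataset counts.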


Identifying Critical Phases for Disease Onset with Sparse Haematological Biomarkers

arXiv.org Artificial Intelligence

Routinely collected clinical blood tests are an emerging molecular data source for large-scale biomedical research, but they inherently feature irregular sampling and informative observation. Traditional approaches rely on imputation, which can distort learning signals and bias predictions while lacking biological interpretability. We propose a novel methodology using Graph Neural Additive Networks (GNAN) to model biomarker trajectories as time-weighted directed graphs. In these graphs, nodes represent sampling events and edges encode the time delta between events. GNAN's additive structure enables the explicit decomposition of feature and temporal contributions, allowing the detection of critical disease-associated time points. Unlike conventional imputation-based approaches, our method preserves the temporal structure of sparse data without introducing artificial biases. It also provides inherently interpretable predictions by decomposing contributions from each biomarker and time interval. This makes our model clinically applicable and allows it to discover biologically meaningful disease signatures.
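The sketch below illustrates two ideas: building a time-weighted event graph, and splitting an additive score by biomarker and by time. It is a toy in the spirit of GNAN, not the paper's model, and rests on two assumptions that are not from the source. First, each sampling event is linked to every later event. Second, the per-biomarker shape functions `f_k` and the time-weight function are hand-picked here, whereas a GNAN learns them. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence

import numpy as np


def build_event_graph(times: Sequence[float]) -> list[tuple[int, int, float]]:
    """Directed edges from each sampling event to every later one, weighted by time delta.

    Assumes `times` is sorted ascending (one node per blood-test event).
    """
    t = np.asarray(times, dtype=float)
    if np.any(np.diff(t) < 0):
        raise ValueError("times must be sorted in ascending order")
    return [(i, j, float(t[j] - t[i])) for i in range(len(t)) for j in range(i + 1, len(t))]


def additive_contributions(
    values: np.ndarray,
    times: Sequence[float],
    t_pred: float,
    shape_fns: Sequence[Callable[[float], float]],
    time_weight: Callable[[float], float],
) -> np.ndarray:
    """Matrix of contributions f_k(x_ik) * rho(t_pred - t_i), one row per event.

    Unmeasured biomarkers are NaN and contribute 0: nothing is imputed.
    """
    x = np.asarray(values, dtype=float)
    contrib = np.zeros_like(x)
    for i, t_i in enumerate(times):
        w = time_weight(t_pred - t_i)  # weight depends only on time before prediction
        for k, f_k in enumerate(shape_fns):
            if not np.isnan(x[i, k]):
                contrib[i, k] = f_k(x[i, k]) * w
    return contrib
```

The score is the sum of the returned matrix. Its column sums give each biomarker's share and its row sums give each sampling event's share, which is the kind of decomposition used to flag critical time points.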


Exploration of Hepatitis B Virus Infection Dynamics through Virology-Informed Neural Network: A Novel Artificial Intelligence Approach

arXiv.org Artificial Intelligence

In this work, we introduce Virology-Informed Neural Networks (VINNs), a powerful tool for capturing the intricate dynamics of viral infection when data for some compartments of the model are not available. VINNs, an extension of the widely known Physics-Informed Neural Networks (PINNs), offer an alternative to traditional numerical methods for solving systems of differential equations. We apply this VINN technique to a recently proposed hepatitis B virus (HBV) infection dynamics model to predict the transmission of the infection within the liver more accurately. This model consists of four compartments, namely uninfected hepatocytes, infected hepatocytes, rcDNA-containing capsids, and free viruses, and it also accounts for capsid recycling. Using VINNs, we study the impact of variations in parameter range, experimental noise, data variability, network architecture, and learning rate. To demonstrate the robustness and effectiveness of VINNs, we apply the approach to data collected from nine HBV-infected chimpanzees and observe that VINNs can effectively estimate the model parameters. VINNs reliably capture the dynamics of infection spread and accurately predict its future progression using real-world data. Furthermore, VINNs efficiently identify the most influential parameters in HBV dynamics based solely on experimental data from the capsid component. We also expect that this framework can be extended beyond viral dynamics, providing a powerful tool for uncovering hidden patterns and complex interactions across various scientific and engineering domains.
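For orientation, here is the generic physics-informed objective that VINNs build on, written for a partially observed compartment model.

Notation (assumed here, not taken from the paper):
- A network û(t; w) with weights w approximates the four-compartment state u(t).
- f(u; θ) is the right-hand side of the HBV model, which is not reproduced here.
- θ are the unknown model parameters, trained jointly with w.
- O is the set of compartments with data; for example, only the capsid compartment.
- The data term uses N_d observation times t_i, and the residual term uses N_r collocation points t_j, where dû/dt is obtained by automatic differentiation.

This is a sketch of the standard PINN form; the paper's exact weighting and normalization may differ.

```latex
\mathcal{L}(w,\theta) =
\frac{1}{N_d}\sum_{i=1}^{N_d}\sum_{c\in\mathcal{O}}
  \bigl(\hat{u}_c(t_i; w) - u_c^{\mathrm{obs}}(t_i)\bigr)^2
+ \frac{1}{N_r}\sum_{j=1}^{N_r}
  \Bigl\lVert \frac{d\hat{u}}{dt}(t_j; w) - f\bigl(\hat{u}(t_j; w); \theta\bigr) \Bigr\rVert_2^2
```

Unobserved compartments enter only through the residual term. This is what allows parameters to be estimated when data for some compartments are missing.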


BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning

arXiv.org Artificial Intelligence

The applications of large language models (LLMs) in various biological domains have been explored recently. However, their ability to reason about complex biological systems such as pathways remains underexplored, even though such reasoning is crucial for predicting biological phenomena, formulating hypotheses, and designing experiments. This work explores the potential of LLMs in pathway reasoning. We introduce BioMaze, a dataset of 5.1K complex pathway problems derived from real research. It covers various biological contexts, including natural dynamic changes, disturbances, additional intervention conditions, and multi-scale research targets. Our evaluation of methods such as chain-of-thought (CoT) and graph-augmented reasoning shows that LLMs struggle with pathway reasoning, especially in perturbed systems. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation. This enables a more effective, scientifically aligned approach to handling the complexities of biological systems. The dataset and code are available at https://github.com/zhao-ht/BioMaze.
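PathSeeker's navigation is driven by the LLM, which is not reproduced here. The sketch below shows only a k-hop subgraph expansion primitive. Such an agent could call it to fetch the local neighbourhood of a pathway graph around entities mentioned in a question. The adjacency-list format and the function name are assumptions. The toy graph is a simplified EGF -> EGFR -> RAS -> RAF -> MEK cascade.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def k_hop_subgraph(
    graph: Dict[Hashable, List[Hashable]],
    seeds: Iterable[Hashable],
    k: int,
) -> Tuple[Set[Hashable], Set[Tuple[Hashable, Hashable]]]:
    """Breadth-first expansion of a directed pathway graph up to k hops from the seeds."""
    depth = {s: 0 for s in seeds}
    queue = deque(depth)
    edges = set()
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # frontier reached: do not expand further
        for nbr in graph.get(node, []):
            edges.add((node, nbr))
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth), edges
```

An interactive agent would call this repeatedly, choosing new seeds from the returned nodes, instead of fixing k in advance.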


Multi Agent based Medical Assistant for Edge Devices

arXiv.org Artificial Intelligence

Large Action Models (LAMs) have revolutionized intelligent automation, but their application in healthcare faces challenges due to privacy concerns, latency, and dependency on internet access. This report introduces an on-device, multi-agent healthcare assistant that overcomes these limitations. The system uses smaller, task-specific agents to optimize resources and to ensure scalability and high performance. Our proposed system acts as a one-stop solution for healthcare needs, with features such as appointment booking, health monitoring, medication reminders, and daily health reporting. The Planner and Caller Agents are powered by the Qwen Code Instruct 2.5 7B model. On our tasks, they achieve average RougeL scores of 85.5 for planning and 96.5 for calling while remaining lightweight enough for on-device deployment. This innovative approach combines the benefits of on-device systems with multi-agent architectures, paving the way for user-centric healthcare solutions.
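The reported RougeL scores compare generated plans and calls against references through their longest common subsequence (LCS). The paper does not state its ROUGE-L variant or tokenization. This sketch therefore assumes whitespace tokens and the F1 form, and scores are on a 0 to 1 scale, so multiply by 100 to compare with figures like 85.5.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on whitespace tokens (assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, "book doctor appointment" against the reference "book appointment with doctor" shares an LCS of two tokens. This gives precision 2/3, recall 1/2 and F1 = 4/7.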


Machine Learning Applications to Diffuse Reflectance Spectroscopy in Optical Diagnosis; A Systematic Review

arXiv.org Artificial Intelligence

Diffuse reflectance spectroscopy (DRS) is noninvasive and sensitive both to absorption, which is related to tissue biomolecular content, and to scattering changes, which are associated with subcellular morphology. These properties make it an extremely powerful tool for analysing tissue composition, microstructure, or oxygenation status, with promising performance in applications such as cancer diagnostics and surgical guidance [1, 30, 85, 121]. DRS signals are measured by delivering light, typically from a white light source, into the tissue and detecting the diffusely reflected signal at a certain distance from the source. The distance between the emitting and receiving fibres determines the tissue depth probed. Depending on the application and clinical objective, multiple illumination or detection fibres can be used to obtain more quantitative information and to probe different depths. Light delivery and collection are often handled using optical fibres or fibre bundles. When incident on the tissue, the light undergoes scattering and absorption, which alter the light intensity across the measured spectrum [75, 121].
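A common preprocessing step converts the raw detected intensity into a calibrated reflectance before DRS spectra reach a machine-learning model. The calibration uses a white reflectance standard and a dark (background) measurement. This passage does not describe that step. The sketch below assumes the standard (S - D) / (W - D) calibration, applied wavelength by wavelength, and is illustrative only.

```python
import numpy as np


def calibrated_reflectance(sample, white, dark):
    """R(lambda) = (S - D) / (W - D), computed wavelength by wavelength.

    sample: raw tissue spectrum S; white: spectrum of a reflectance standard W;
    dark: detector background D (source off). All share one wavelength grid.
    """
    s, w, d = (np.asarray(a, dtype=float) for a in (sample, white, dark))
    denom = w - d
    if np.any(denom <= 0):
        raise ValueError("white reference must exceed the dark background at every wavelength")
    return (s - d) / denom
```

Dividing by the white reference removes the source spectrum and detector response. The remaining spectral shape then reflects mainly the tissue's absorption and scattering.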


Federated Variational Inference for Bayesian Mixture Models

arXiv.org Machine Learning

We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled 'divide and conquer' inference procedure based on variational inference. It first applies local merge and delete moves within batches of the data in parallel, followed by 'global' merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well in comparison to existing clustering algorithms. We validate the practical utility of the method by applying it to large-scale electronic health record (EHR) data.
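To see why merge moves can rely on summaries alone, consider the binary case: a mixture of independent Bernoulli distributions with Beta priors. This minimal sketch is not the paper's full procedure. Under that model, each batch only needs to share expected cluster sizes and expected per-feature counts. Further assumptions:
- The responsibilities come from local variational updates.
- Cluster labels are assumed already matched across batches when summaries are summed.
- The decision to accept a merge (for example, by checking whether the variational objective improves) is omitted.
- The function names are hypothetical.

```python
import numpy as np


def batch_summaries(X, resp):
    """Per-cluster summaries for a Bernoulli mixture on one data batch.

    X: (n, d) binary matrix; resp: (n, K) variational responsibilities.
    Returns N (K,) expected cluster sizes and S (K, d) expected counts of ones.
    Only these arrays, not X, would leave the local node.
    """
    X = np.asarray(X, dtype=float)
    resp = np.asarray(resp, dtype=float)
    return resp.sum(axis=0), resp.T @ X


def merge_clusters(N, S, j, k):
    """Merge cluster k into cluster j using the summaries alone."""
    N, S = np.array(N, dtype=float), np.array(S, dtype=float)
    N[j] += N[k]
    S[j] += S[k]
    return np.delete(N, k), np.delete(S, k, axis=0)


def beta_posterior(N, S, a=1.0, b=1.0):
    """Beta(a + S, b + N - S) parameters for each cluster/feature probability."""
    N, S = np.asarray(N, dtype=float), np.asarray(S, dtype=float)
    return a + S, b + N[:, None] - S
```

Summaries are additive, so summing them across batches, or across two clusters, gives exactly the statistics that pooled data would have produced. The merged posterior can therefore be evaluated without access to raw records.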